COS 424 : Interacting with Data
Abstract
The previous lecture defined the Nearest Neighbor algorithm and discussed how it suffers from the curse of dimensionality: as the number of dimensions increases, the Nearest Neighbor algorithm performs progressively worse. To understand why, one must first understand what higher dimensions look like. The following discussion demonstrates how higher dimensions (n >> 3) are qualitatively different from lower dimensions (2 or 3).
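One common way to see this qualitative difference is to compare the volume of the unit ball with the volume of the cube that encloses it. This illustration is chosen here and is not necessarily the one used in the lecture. The sketch below assumes the standard closed form for the volume of an n-dimensional unit ball, pi^(n/2) / Gamma(n/2 + 1); the function name `ball_to_cube_volume_ratio` is hypothetical.

```python
import math


def ball_to_cube_volume_ratio(n: int) -> float:
    """Fraction of the cube [-1, 1]^n occupied by the inscribed unit ball."""
    # Volume of the n-dimensional unit ball: pi^(n/2) / Gamma(n/2 + 1).
    ball_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    # Volume of the enclosing cube with side length 2.
    cube_volume = 2.0 ** n
    return ball_volume / cube_volume


if __name__ == "__main__":
    # The ratio shrinks rapidly as n grows: in high dimensions almost all
    # of the cube's volume lies in its corners, far from the center.
    for n in (2, 3, 5, 10, 20):
        print(f"n = {n:2d}  ratio = {ball_to_cube_volume_ratio(n):.3e}")
```

In 2 dimensions the ratio is pi/4 and in 3 dimensions it is pi/6, but it falls below 1% by n = 10. For nearest-neighbor methods this suggests that points spread uniformly over the cube are rarely close to any given query point once n is large.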
Related resources
COS 424 : Interacting with Data
This course is about data! Data is everywhere. Everything is computerized now and vast amounts of data can be easily stored. Concomitant with vast amounts of data is the belief that this data will be useful. There are many practical issues regarding data that this course will not cover, such as storing data, databases, and transferring data. This class will be concerned with how to get the mos...
COS 424 : Interacting with Data
2 Classification Error Suppose that we cut off the growing process at various points and evaluate the error of the tree at each of those points. This would lead to a graph of size vs. error (where error is the probability of making a mistake). There are two error rates to be considered: • training error (i.e. fraction of mistakes made on the training set) • testing ...
COS 424 : Interacting with Data
This lecture covers the core concepts in probability and statistics used in the course. These include random variables, continuous and discrete distributions, joint and conditional distributions, the chain rule, marginalization, Bayes' rule, independence and conditional independence, and expectation. Probability models are discussed along with the concepts of independently and id...
COS 424 : Interacting with Data
We began the lecture with some final words on graphical models. Choosing a graphical model is akin to choosing a probability model for your data or choosing an algorithm. Each model has advantages and disadvantages that may make it more or less suitable for modeling your data. A graphical model is a representation of a probability model, so it also carries that model’s plusses and minuses. Grap...
COS 424 : Interacting with Data
In this problem, two types of data will be available. The first type is called presence records. Presence records are pixels on the grid map where the species of concern was observed. The same pixel may be present multiple times if the species was observed more than once within that pixel. The second type is called environmental variables. Each envi...